Name: Ke Chen, Abhinav Bannerjee

UT EID: kc35827, ab45393

General Steps

  1. Acquire data source (ours was Top 2000 Companies as ranked by Forbes)
  2. Format data using ETL techniques for upload to Oracle SQL
  3. Analyze data and prepare visualizations in Tableau and RStudio (using ggplot2 and other packages)
  4. Create and publish Shiny web application with interactive visualizations

ETL Files

These are the two ETL files we used.

source("../01 Data/1_etl.R",print.eval=FALSE,echo = TRUE)
## 
## > setwd("/Users/CK/DV_Finalproject/01 Data")
## 
## > file_path <- "1.csv"
## 
## > df <- read.csv(file_path)
## 
## > names(df) <- gsub("\\.+", "_", names(df))
## 
## > str(df)
## 'data.frame':    2008 obs. of  10 variables:
##  $ Company     : Factor w/ 1999 levels "3i Group","3M",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ Country     : Factor w/ 62 levels "Argentina","Australia",..: 59 60 27 26 18 54 60 60 52 59 ...
##  $ State       : Factor w/ 40 levels "Alabama","Arizona",..: 35 20 35 35 35 35 11 11 35 35 ...
##  $ Industry    : Factor w/ 83 levels "Advertising",..: 52 23 71 36 80 23 64 64 25 52 ...
##  $ Rank        : int  1573 205 1616 1895 1360 201 248 303 1349 1755 ...
##  $ Year        : int  2015 2015 2015 2015 2015 2015 2015 2015 2015 2015 ...
##  $ Sales       : int  1200 31800 982 6700 1700 40100 21300 20000 9500 1900 ...
##  $ Profits     : int  1000 5000 178 84 420 2600 2300 1800 166 473 ...
##  $ Assets      : int  7300 31300 69200 14600 60000 45200 41300 27500 30600 8700 ...
##  $ Market_Value: int  7000 105100 2100 3400 2600 50300 69900 91700 3200 9300 ...
## 
## > measures <- c("Rank", "Year", "Sales", "Profits", 
## +     "Assets", "Market_Value")
## 
## > dimensions <- setdiff(names(df), measures)
## 
## > write.csv(format(df, scientific = NA), paste(gsub(".csv", 
## +     "", file_path), "..csv", sep = ""), row.names = FALSE, na = "")
## 
## > tableName <- gsub(" +", "_", gsub("[^A-z, 0-9, ]", 
## +     "", gsub(".csv", "", file_path)))
## 
## > sql <- paste("CREATE TABLE", tableName, "(\n-- Change table_name to the table name you want.\n")
## 
## > if (length(measures) > 1 || !is.na(dimensions)) {
## +     for (d in dimensions) {
## +         sql <- paste(sql, paste(d, "varchar2(4000),\n"))
## +     }
## + .... [TRUNCATED] 
## 
## > if (length(measures) > 1 || !is.na(measures)) {
## +     for (m in measures) {
## +         if (m != tail(measures, n = 1)) 
## +             sql <- paste(sq .... [TRUNCATED] 
## 
## > sql <- paste(sql, ");")
## 
## > cat(sql)
## CREATE TABLE 1 (
## -- Change table_name to the table name you want.
##  Company varchar2(4000),
##  Country varchar2(4000),
##  State varchar2(4000),
##  Industry varchar2(4000),
##  Rank number(38,4),
##  Year number(38,4),
##  Sales number(38,4),
##  Profits number(38,4),
##  Assets number(38,4),
##  Market_Value number(38,4)
##  );
source("../01 Data/2_etl.R",print.eval=FALSE,echo = TRUE)
## 
## > setwd("/Users/CK/DV_Finalproject/01 Data")
## 
## > file_path <- "2.csv"
## 
## > df <- read.csv(file_path)
## 
## > names(df) <- gsub("\\.+", "_", names(df))
## 
## > str(df)
## 'data.frame':    245 obs. of  2 variables:
##  $ CountryName  : Factor w/ 245 levels "Afghanistan",..: 199 201 77 163 2 146 182 220 242 1 ...
##  $ ContinentName: Factor w/ 9 levels "Africa","Antarctica",..: 1 2 2 3 6 4 7 4 1 3 ...
## 
## > measures <- c()
## 
## > dimensions <- setdiff(names(df), measures)
## 
## > write.csv(format(df, scientific = NA), paste(gsub(".csv", 
## +     "", file_path), "..csv", sep = ""), row.names = FALSE, na = "")
## 
## > tableName <- gsub(" +", "_", gsub("[^A-z, 0-9, ]", 
## +     "", gsub(".csv", "", file_path)))
## 
## > sql <- paste("CREATE TABLE", tableName, "(\n-- Change table_name to the table name you want.\n")
## 
## > if (length(measures) > 1 || !is.na(dimensions)) {
## +     for (d in dimensions) {
## +         sql <- paste(sql, paste(d, "varchar2(4000),\n"))
## +     }
## + .... [TRUNCATED] 
## 
## > sql <- paste(sql, ");")
## 
## > cat(sql)
## CREATE TABLE 2 (
## -- Change table_name to the table name you want.
##  CountryName varchar2(4000),
##  ContinentName varchar2(4000),
##  );

Dataframes

Here are the code for us to query dataset from SQL.

The include the summary of the data and also the head of 20 companies in the world.(Ascending in Rank.)

source("../01 Data/DATAFRAME1.R",echo = TRUE)
## 
## > require("jsonlite")
## 
## > require("RCurl")
## 
## > require("dplyr")
## 
## > df1 <- data.frame(fromJSON(getURL(URLencode("skipper.cs.utexas.edu:5001/rest/native/?query=\"select * from TOPCOMPANY \""), 
## +     httpheader = c(DB .... [TRUNCATED] 
## 
## > summary(df1)
##                           COMPANY               COUNTRY   
##  Akamai Technologies          :   2   United States :583  
##  Credito Valtellinese         :   2   Japan         :219  
##  Enagas                       :   2   China         :180  
##  Erie Indemnity               :   2   United Kingdom: 96  
##  Huntington Ingalls Industries:   2   South Korea   : 66  
##  ICAP                         :   2   France        : 61  
##  (Other)                      :1996   (Other)       :803  
##         STATE                             INDUSTRY         RANK       
##  Total     :1425   Regional Banks             : 272   Min.   :   1.0  
##  California:  77   Oil & Gas Operations       :  93   1st Qu.: 503.8  
##  New York  :  73   Real Estate                :  88   Median :1006.5  
##  Texas     :  57   Electric Utilities         :  87   Mean   :1004.7  
##  Illinois  :  37   Investment Services        :  87   3rd Qu.:1508.2  
##  New Jersey:  23   Telecommunications services:  66   Max.   :2000.0  
##  (Other)   : 316   (Other)                    :1315                   
##       YEAR          SALES           PROFITS            ASSETS    
##  Min.   :2015   Min.   :  -115   Min.   :-7500.0   7200   :  15  
##  1st Qu.:2015   1st Qu.:  4100   1st Qu.:  336.8   5200   :  13  
##  Median :2015   Median :  9300   Median :  637.5   14900  :  12  
##  Mean   :2015   Mean   : 19405   Mean   : 1449.0   10900  :  11  
##  3rd Qu.:2015   3rd Qu.: 18925   3rd Qu.: 1400.0   10300  :  10  
##  Max.   :2015   Max.   :485700   Max.   :77400.0   10700  :  10  
##                                                    (Other):1937  
##   MARKET_VALUE   
##  Min.   :    73  
##  1st Qu.:  6400  
##  Median : 12000  
##  Mean   : 23936  
##  3rd Qu.: 24625  
##  Max.   :741800  
##                  
## 
## > df1 <- df1 %>% arrange(RANK)
## 
## > head(df1, 20)
##                       COMPANY        COUNTRY       STATE
## 1                        ICBC          China       Total
## 2     China Construction Bank          China       Total
## 3  Agricultural Bank of China          China       Total
## 4               Bank of China          China       Total
## 5          Berkshire Hathaway  United States    Nebraska
## 6              JPMorgan Chase  United States    New York
## 7                 Exxon Mobil  United States       Texas
## 8                  PetroChina          China       Total
## 9            General Electric  United States Connecticut
## 10                Wells Fargo  United States  California
## 11               Toyota Motor          Japan       Total
## 12                      Apple  United States  California
## 13          Royal Dutch Shell    Netherlands       Total
## 14           Volkswagen Group        Germany       Total
## 15              HSBC Holdings United Kingdom       Total
## 16                    Chevron  United States  California
## 17            Wal-Mart Stores  United States    Arkansas
## 18        Samsung Electronics    South Korea       Total
## 19                  Citigroup  United States    New York
## 20               China Mobile          China       Total
##                       INDUSTRY RANK YEAR  SALES PROFITS  ASSETS
## 1                  Major Banks    1 2015 166800   44800 3322000
## 2               Regional Banks    2 2015 130500   37000 2698900
## 3               Regional Banks    3 2015 129200   29100 2574800
## 4                  Major Banks    4 2015 120300   27500 2458300
## 5          Investment Services    5 2015 194700   19900  534600
## 6                  Major Banks    6 2015  97800   21200 2593600
## 7         Oil & Gas Operations    7 2015 376200   32500  349500
## 8         Oil & Gas Operations    8 2015 333400   17400  387700
## 9                Conglomerates    9 2015 148500   15200  648300
## 10                 Major Banks   10 2015  90400   23100 1701400
## 11  Auto & Truck Manufacturers   11 2015 252200   19100  389700
## 12           Computer Hardware   12 2015 199400   44500  261900
## 13        Oil & Gas Operations   13 2015 420400   14900  353100
## 14  Auto & Truck Manufacturers   14 2015 268500   14400  425000
## 15                 Major Banks   15 2015  81100   13500 2634100
## 16        Oil & Gas Operations   16 2015 191800   19200  266000
## 17             Discount Stores   16 2015 485700   16400  203700
## 18              Semiconductors   18 2015 195900   21900  209600
## 19                 Major Banks   19 2015  93900    7200 1846000
## 20 Telecommunications services   20 2015 104100   17700  209000
##    MARKET_VALUE
## 1        278300
## 2        212900
## 3        189900
## 4        199100
## 5        354800
## 6        225500
## 7        357100
## 8        334600
## 9        253500
## 10       278300
## 11       239000
## 12       741800
## 13       195400
## 14       126000
## 15       167700
## 16       201000
## 17       261300
## 18       199400
## 19       156700
## 20       271500

As we can see in above. United States Japan, and China has the most of companies in the ranking list.And the profits’ range start from 7.5 Billion defecit to 77.4 B Surplus.

Also,in the top 20 Companies, There are 9 financial related Companies, which is almost 50 % in the top 20. And These 9 comapnies include 6 major banks, 2 regional banks and 1 investment Services. Moreover, There are 4 Oil $ Gas companies, which is about 20 % in the top 20.

source("../01 Data/DATAFRAME2.R",echo = TRUE)
## 
## > require("jsonlite")
## 
## > require("RCurl")
## 
## > df2 <- data.frame(fromJSON(getURL(URLencode("skipper.cs.utexas.edu:5001/rest/native/?query=\"select * from CONTINENT\""), 
## +     httpheader = c(DB = .... [TRUNCATED]

Shiny

RShiny App. Click Here!


The following two files are the code for Shiny.

source("../04 Shiny/ui.R",echo = TRUE)
## 
## > library(shiny)
## 
## > navbarPage(title = "Final Project", tabPanel(title = "Scatter Plot", 
## +     fluidRow(column(5), column(7, actionButton(inputId = "scatter", 
## +       .... [TRUNCATED] 
## <nav class="navbar navbar-default navbar-static-top" role="navigation">
##   <div class="container">
##     <div class="navbar-header">
##       <span class="navbar-brand">Final Project</span>
##     </div>
##     <ul class="nav navbar-nav">
##       <li class="active">
##         <a href="#tab-7778-1" data-toggle="tab" data-value="Scatter Plot">Scatter Plot</a>
##       </li>
##       <li>
##         <a href="#tab-7778-2" data-toggle="tab" data-value="Crosstab">Crosstab</a>
##       </li>
##       <li>
##         <a href="#tab-7778-3" data-toggle="tab" data-value="Bar Chart">Bar Chart</a>
##       </li>
##       <li>
##         <a href="#tab-7778-4" data-toggle="tab" data-value="Histogram">Histogram</a>
##       </li>
##       <li>
##         <a href="#tab-7778-5" data-toggle="tab" data-value="Pie Chart">Pie Chart</a>
##       </li>
##     </ul>
##   </div>
## </nav>
## <div class="container-fluid">
##   <div class="tab-content">
##     <div class="tab-pane active" data-value="Scatter Plot" id="tab-7778-1">
##       <div class="row">
##         <div class="col-sm-5"></div>
##         <div class="col-sm-7">
##           <button id="scatter" type="button" class="btn btn-default action-button">Generate Scatter Plot</button>
##         </div>
##       </div>
##       <div class="row">
##         <div class="col-sm-1 col-sm-offset-1">
##           <div class="col-sm-8">
##             <div id="scatterplot" class="shiny-plot-output" style="width: 100% ; height: 400px"></div>
##           </div>
##         </div>
##       </div>
##     </div>
##     <div class="tab-pane" data-value="Crosstab" id="tab-7778-2">
##       <div class="row">
##         <div class="col-sm-2"></div>
##         <div class="col-sm-5">
##           <div class="form-group shiny-input-container">
##             <label class="control-label" for="kpi1">Count Rank(low fence)</label>
##             <input class="js-range-slider" id="kpi1" data-min="1" data-max="158" data-from="5" data-step="1" data-grid="true" data-grid-num="9.8125" data-grid-snap="false" data-prettify-separator="," data-keyboard="true" data-keyboard-step="0.636942675159236" data-drag-interval="true" data-data-type="number"/>
##           </div>
##         </div>
##         <div class="col-sm-5">
##           <div class="form-group shiny-input-container">
##             <label class="control-label" for="kpi2">Count Rank (medium fence)</label>
##             <input class="js-range-slider" id="kpi2" data-min="1" data-max="158" data-from="20" data-step="1" data-grid="true" data-grid-num="9.8125" data-grid-snap="false" data-prettify-separator="," data-keyboard="true" data-keyboard-step="0.636942675159236" data-drag-interval="true" data-data-type="number"/>
##           </div>
##         </div>
##       </div>
##       <div class="col-sm-8">
##         <div id="crosstab" class="shiny-plot-output" style="width: 100% ; height: 400px"></div>
##       </div>
##     </div>
##     <div class="tab-pane" data-value="Bar Chart" id="tab-7778-3">
##       <div class="row">
##         <div class="col-sm-5"></div>
##         <div class="col-sm-7">
##           <button id="bar" type="button" class="btn btn-default action-button">Generate Bar Chart</button>
##         </div>
##       </div>
##       <div class="col-sm-8">
##         <div id="barchart" class="shiny-plot-output" style="width: 100% ; height: 400px"></div>
##       </div>
##     </div>
##     <div class="tab-pane" data-value="Histogram" id="tab-7778-4">
##       <div class="row">
##         <div class="col-sm-3"></div>
##         <div class="col-sm-5">
##           <div class="form-group shiny-input-container">
##             <label class="control-label" for="hist1">Binsize</label>
##             <input class="js-range-slider" id="hist1" data-min="0" data-max="5000" data-from="1000" data-step="1" data-grid="true" data-grid-num="10" data-grid-snap="false" data-prettify-separator="," data-keyboard="true" data-keyboard-step="0.02" data-drag-interval="true" data-data-type="number"/>
##           </div>
##         </div>
##         <div class="col-sm-8">
##           <div id="Histogram" class="shiny-plot-output" style="width: 100% ; height: 400px"></div>
##         </div>
##       </div>
##     </div>
##     <div class="tab-pane" data-value="Pie Chart" id="tab-7778-5">
##       <div class="row">
##         <div class="col-sm-5"></div>
##         <div class="col-sm-7">
##           <div id="radio" class="form-group shiny-input-radiogroup shiny-input-container">
##             <label class="control-label" for="radio">
##               <h3>Slect the Measure</h3>
##             </label>
##             <div class="shiny-options-group">
##               <div class="radio">
##                 <label>
##                   <input type="radio" name="radio" value="1" checked="checked"/>
##                   <span>Count of Rank</span>
##                 </label>
##               </div>
##               <div class="radio">
##                 <label>
##                   <input type="radio" name="radio" value="2"/>
##                   <span>Market Value</span>
##                 </label>
##               </div>
##             </div>
##           </div>
##         </div>
##       </div>
##       <div class="row">
##         <div class="col-sm-5"></div>
##         <div class="col-sm-7">
##           <button id="piechar" type="button" class="btn btn-default action-button">Generate Pie Chart</button>
##         </div>
##       </div>
##       <div class="col-sm-8">
##         <div id="piechart" class="shiny-plot-output" style="width: 100% ; height: 400px"></div>
##       </div>
##     </div>
##   </div>
## </div>
source("../04 Shiny/server.R",echo = TRUE)
## 
## > require("jsonlite")
## 
## > require("RCurl")
## 
## > require(ggplot2)
## 
## > require(dplyr)
## 
## > require(shiny)
## 
## > require(shinydashboard)
## 
## > require(leaflet)
## 
## > shinyServer(function(input, output) {
## +     df1 <- eventReactive(input$scatter, {
## +         data.frame(fromJSON(getURL(URLencode("skipper.cs.utexas. ..." ... [TRUNCATED]

All shiny graph codes are included here. We use grammar of graph and Data wrangling to create these graphs.


Visualization


We Start our project by following the steps in Methodology. We try to use different kind of Plots to discover interesting things. We generate Barchart, Crosstab, Histogram,Scatter Plot,Box Plot,Map,Pie Chart and Bubble to find some interesting facts.


Barchart allows us to see which industries take up most of the ranks


Crosstab displays number of ranked companies in every continent based on industry


This is a breakdown of % total profits and % total average profits for each industry


The boxplot visualizes the highest profits based on industry in each continent The following boxplot is the plot after we exclude some extreme outliers.(Computer Hardware in North America,Major Bank in Asia, NA, Australia and Central America).


Pie Chart shows market value of industries


This chart relates industry and market value by size


This map shows a state-by-state breakdown of notable ranked companies and their respective industry


This map shows the average profits among the ranked companies worldwide


This map shows median profits (in millions USD) in each country


Histogram shows concentration of overall profits in companies


Scatterplot shows logarithmic regression of profits and overall rank for each company